Abstract

The need to discover patterns in spatio-temporal (ST) data has driven
much recent research in ST co-occurrence patterns. Early work focused
on discovering spatial patterns such as co-location without examining
the development of patterns over time or the temporal aspect of ST
datasets. This paper describes a novel set of co-occurrence patterns
called mixed-drove co-occurrence patterns (MDCOPs). They represent
subsets of two or more different ST object-types whose instances are
close to each other both spatially and temporally. However, mining
MDCOPs is computationally very expensive due to complex interest
measures, large archived and historical datasets, and exponential
growth in candidate patterns with the number of object-types. We
propose a monotonic composite interest measure for discovering MDCOPs
and two novel MDCOP mining algorithms. Analytical results show that the
proposed algorithms are correct and complete. Experimental results also
show that the proposed methods are computationally more efficient than
naive alternatives.


1.  Introduction 

The Army manages huge amounts of spatio-temporal (ST) data from a
multitude of databases, and the volume of such data continues to expand
as database archives grow and ST sensors increase in number and
resolution. This data is stored in attributes, values and tables, with
important relationships and patterns that are not known. Humans have
limited capacities to find these patterns with analysis and knowledge
discovery, making automated and semi-automated pattern analysis
essential. Relationships between different ST events are vital to
increasing knowledge and understanding of military challenges such as
discovering and predicting insurgent attack patterns and tactics, and
understanding the nature of asymmetric warfare. As a result, ST
co-occurrence pattern mining has been the subject of much recent
research.

Given a moving object database, our aim is to discover mixed-drove ST
co-occurrence patterns (MDCOPs) representing subsets of different
object-types whose instances are located close together in geographic
space for a significant fraction of time. Unlike the objectives of some
other ST co-occurrence pattern identification approaches where the
pattern is the primary interest, in MDCOPs both the pattern and the
nature of the different object-types are of interest.

An example of an MDCOP arises in American football, where two teams try
to outscore each other by moving a football to the opponent's end of
the field. Various complex interactions occur within and across teams
to achieve this goal. These interactions involve intentional and
accidental MDCOPs, the identification of which may help teams to study
their opponent's tactics. Object-types may be defined by the roles of
the offensive and defensive players, such as quarterback, running back,
wide receiver, kicker, holder, linebacker, and cornerback. An MDCOP is
a subset of these object-types (such as {kicker, holder} or
{wide_receiver, cornerback}) whose instances frequently occur together.
One example MDCOP involves offensive wide receivers, defensive
linebackers, and defensive cornerbacks, and is called a Broken Blitz
play. In this play, the objective of the offensive wide receivers is to
outrun any linebackers and defensive backs and get behind them,
catching an undefended pass and hopefully running untouched for a
touchdown. This interaction creates an MDCOP between wide receivers and
cornerbacks. An example Broken Blitz play is given in Fig. 1. It shows
the positions of four offensive wide receivers (W.1, W.2, W.3, and
W.4), two defensive cornerbacks (C.1 and C.2), two defensive
linebackers (L.1 and L.2), and a quarterback (Q.1) in four time slots.
The solid lines between the players connect neighboring players. The
wide receivers W.1 and W.4 cross over each other and the wide receivers
W.2 and W.3 run directly to the end zone of the field. Initially, the
wide receivers W.1 and W.4 are co-located with cornerbacks C.1 and C.2
and the wide receivers W.2 and W.3 are co-located with linebackers L.1
and L.2 at time slot t=0 (Fig. 1a). In time slot t=1, the four wide
receivers begin to run, while the linebackers run towards the
quarterback and the cornerbacks remain in their original position,
possibly due to a fake handoff from the quarterback to the running back
(Fig. 1b). In time slot t=2, the wide receivers W.1 and W.4 cross over
each other and try to drift further away from their respective
cornerbacks (Fig. 1c). When the quarterback shows signs of throwing the
football, both cornerbacks and linebackers run to their respective wide
receivers (Fig. 1d). The overall sketch of the game tactics can be seen
in Fig. 1e. In this example, wide receivers and cornerbacks form an
MDCOP since their co-location is persistent over time, occurring in 2
out of 4 time slots. However, wide receivers and linebackers do not
form an MDCOP due to the lack of temporal persistence.

Figure 1. An example Broken Blitz play in American football

Other applications for which discovering co-occurring patterns of
specific combinations of object-types is important include battlefield
planning and strategy, ecology (tracking species and pollutant
movements), homeland defense (looking for significant "events"), and
transportation (road and network planning) [8, 11].

However, discovering MDCOPs poses several non-trivial challenges.
First, current interest measures (i.e., the spatial prevalence measure)
are not sufficient to quantify such patterns, so new composite interest
measures must be created and formalized [9]. Second, the set of
candidate patterns grows exponentially with the number of object-types.
Finally, since spatio-temporal datasets are huge, computationally
efficient algorithms must be developed [16].

This paper focuses on MDCOPs (typed collections of moving objects) by
extending interest measures for spatial co-location patterns given a
user-defined participation index threshold [9]. The following issues
are beyond the scope of this paper: (i) determining thresholds for
MDCOP interest measures; (ii) similarity measures for tracking moving
objects; (iii) indexing and query processing issues related to mining
objects; and (iv) discovering multisets (e.g., {A, A, B}).

2.  Related  Work 

Data analysis can be broadly categorized into statistical approaches
and data mining approaches. In statistical approaches, there are bodies
of work in both spatial and temporal analysis. Spatial point patterns
are often described by metrics such as the intensity function and
Ripley's K [14, 15]. Other measures such as complete spatial randomness
(CSR) and spatial covariance functions are used to describe the spatial
relationships of adjacent areas and continuous variables as random
fields [5]. Temporal patterns have been extensively studied in models
such as moving averages, first and second order autoregression,
integration, seasonality, and cointegration [6, 18]. There has also
been some recent research in combining spatial and temporal analysis,
such as Brix and Diggle's extended intensity function and the extended
K(r, t) function [1, 13]. Most attempts to combine the fields suffer
from limitations such as the inability to model space-time
interactions, because they assume separability and independence between
space and time [15]. Statistical research specifically focused on ST
co-occurrence patterns and their possible interactions has been
limited.

Previous data mining studies for mining ST co-occurrence patterns can
be classified into two categories: mining of uniform groups of moving
objects, and mining of mixed groups of moving objects.

To mine uniform groups of moving objects, the problems of discovering
flock patterns [12, 7] and moving clusters [10] are defined. A flock
pattern is a moving group of the same kind of objects, such as a sheep
flock or a bird flock. Gudmundsson et al. proposed algorithms for
detection of the flock pattern in ST datasets [7]. Kalnis et al.
defined the problem of discovering moving clusters and proposed
clustering-based methods to mine such patterns [10]. In this approach,
if there is a large enough number of common objects between clusters in
consecutive time slots, such clusters are called moving clusters. These
methods do not take object-types into account, and thus are not
effective for mining MDCOPs [4].

To mine mixed groups of moving objects, the problems of discovering
collocation episodes [3] and topological patterns [17] are important.
Both generalize co-location patterns [9] to the ST domain. A
collocation episode is a sequence of co-location patterns with some
common object-types across consecutive time slots. However, if there is
no common object-type in consecutive time slots, the proposed approach
will not identify any pattern. For example, if the window length is 2,
the collocation episodes algorithm will not be able to find any pattern
from the dataset given in Fig. 1. The algorithm tries to find
co-location patterns that are persistent in 2 consecutive time slots,
but there is no such pattern in the dataset because wide receivers and
cornerbacks form co-locations in time slots t=0 and t=3, and wide
receivers and linebackers form co-locations only in time slot t=0.
Thus, no co-location pattern persists across consecutive time slots and
no collocation episode is identified in the dataset.

A topological pattern [17] is a subset of object-types whose instances
are close in space and time. An interest measure for a topological
pattern {A, B} (e.g., participation index or support) is based on an ST
join of instances of A and instances of B [9]. This statistic may be
high even if many instances of A and many instances of B are not
spatially together for a moment in time. The semantics of topological
patterns are not well-defined for moving objects. For example, this
approach cannot determine the fraction of time that a pattern occurs.
It may also be unable to tell in which time slots a pattern occurs,
since it has no notion of time slots. In the dataset given in Fig. 1,
this approach will discover the two patterns {W, C} and {W, L}. Because
tracks of objects are represented as ST instances, both patterns have
the same support, even though pattern {W, C} occurs in 2 time slots out
of 4 (a persistent pattern) and pattern {W, L} occurs in 1 time slot
out of 4 (a transient pattern). The persistent pattern {W, C} occurs in
time slots t=0 and t=3: its instances {W.1, C.1} and {W.4, C.2} occur
in time slot t=0, and {W.1, C.2} and {W.4, C.1} in time slot t=3. The
transient pattern {W, L} occurs only in time slot t=0, with instances
{W.2, L.1}, {W.3, L.1}, {W.2, L.2}, and {W.3, L.2}.

In contrast, our proposed interest measure and algorithms will
efficiently mine mixed groups of objects (i.e., MDCOPs) which are close
in space and persistent in time. Unlike the techniques just described,
our approach discovers persistent patterns that co-occur in most but
not all ST intervals; consecutive co-occurrences are not mandatory. For
example, our approach will find the MDCOP {wide_receiver, cornerback}
in Fig. 1 if the time prevalence threshold is 0.5, since its instances
are co-located in 2 time slots out of 4 and the fraction of time slots
where the pattern occurs is therefore no less than the threshold. It
will reject the pattern {W, L} in Fig. 1 at the same threshold, since
its instances are co-located in only 1 time slot out of 4.

3.  Basic  Concepts  &  Problem  Definition 

3.1  Spatial  Prevalence  Measure 

Spatial co-location mining algorithms discover sets of mixed
object-types that are frequently located together in a spatial
framework, given a set of spatial object-types, their instances, and a
spatial neighbor relationship R [9]. For example, in Fig. 2, in time
slot t=0, {A.1, C.1} is an instance of a co-location if the distance
between the objects is no more than a given neighborhood distance
threshold. In Fig. 2, the solid lines connect objects whose distance
satisfies the neighborhood distance threshold. The participation index
is used to determine the strength of the co-location pattern. If the
index is greater than or equal to a threshold [9], a co-location is
called spatial prevalent. The participation index is defined as the
minimum of the participation ratios, where the participation ratio of
an object-type is the fraction of its instances that take part in
co-location instances. For example, in Fig. 2, {A, B} is a co-location
in time slot t=0, and its instances are {A.1, B.1}, {A.2, B.1}, {A.3,
B.2}, and {A.3, B.3}. In the dataset, object-type A has 4 instances and
three of them (A.1, A.2, and A.3) contribute to the co-location {A, B},
so the participation ratio of A is 3/4. The participation ratio of B is
3/5 since 3 out of 5 instances contribute to the co-location {A, B}.
The participation index of the co-location {A, B} is the minimum of 3/4
and 3/5, or 3/5. It has been shown that the participation index is
anti-monotone in the size of co-locations [9]. In other words,
participation_index(P_j) ≤ participation_index(P_i) if P_i is a subset
of P_j. In addition, the participation index has a spatial statistical
interpretation as an upper bound on the cross-K function [5].
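The participation index computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `participation_index` and the dictionary-based instance encoding are assumptions made here, and the data reproduce only the {A, B} example at t=0 as stated in the text.

```python
from collections import defaultdict


def participation_index(instances, type_counts):
    """Return (participation index, participation ratios).

    instances   : co-location instances, each a dict {object_type: object_id}
                  (hypothetical encoding chosen for this sketch)
    type_counts : dict {object_type: total number of instances of that type}
    """
    participating = defaultdict(set)
    for inst in instances:
        for obj_type, obj_id in inst.items():
            # A set counts each object once, however many instances it joins.
            participating[obj_type].add(obj_id)
    ratios = {t: len(participating[t]) / type_counts[t] for t in type_counts}
    return min(ratios.values()), ratios


# Instances of {A, B} at t=0 as listed in the text:
# {A.1, B.1}, {A.2, B.1}, {A.3, B.2}, {A.3, B.3}
instances_ab = [
    {"A": 1, "B": 1},
    {"A": 2, "B": 1},
    {"A": 3, "B": 2},
    {"A": 3, "B": 3},
]
type_counts = {"A": 4, "B": 5}

pi_ab, ratios_ab = participation_index(instances_ab, type_counts)
print(ratios_ab)  # participation ratios: A -> 3/4, B -> 3/5
print(pi_ab)      # participation index: min(3/4, 3/5)
```

Using sets of object identifiers matters here: A.3 appears in two instances but is counted once in the participation ratio of A.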

3.2  Modeling  MDCOPs 

Given a set of spatio-temporal mixed object-types and a set of their
instances with a neighborhood relation R, an MDCOP is a subset of
spatio-temporal mixed object-types whose instances are neighbors in
space and time.

Definition 3.1 Given a spatio-temporal pattern and a set TF of time
slots, such that TF = [T_0, ..., T_{n-1}], the time prevalence or
persistence measure of the pattern is the fraction of time slots where
the pattern occurs over the total number of time slots.

For example, in Fig. 2, the total number of time slots is 4 and pattern
{A, B} occurs in all 4 time slots, so its time prevalence index is 4/4.
Pattern {A, C} occurs in 3 time slots, namely, time slots t=0, t=1, and
t=2, and its time prevalence index is 3/4 (Fig. 2b).

Figure 2. An input spatio-temporal dataset and a set of output MDCOPs:
(a) an input spatio-temporal dataset; (b) a set of output mixed-drove
spatio-temporal co-occurrence patterns


Definition 3.2 Given a spatio-temporal dataset of mixed object-types ST,
and a spatial prevalence threshold θ_p, the mixed-drove prevalence
measure of pattern P_i is a composition of the spatial prevalence and
the time prevalence measures as shown below.

$$\mathrm{Prob}_{t_m \in TF}\,\bigl[\, s\_prev(P_i, \mathrm{time\ slot}\ t_m) \ge \theta_p \,\bigr] \qquad (1)$$

where Prob stands for the probability, over all time slots in TF, that
the pattern is spatial prevalent, and s_prev stands for spatial
prevalence, e.g., the participation index, described in Section 3.1.

Definition 3.3 Given a spatio-temporal dataset of mixed object-types ST
and a threshold pair (θ_p, θ_time), MDCOP P_i is a mixed-drove
prevalent pattern if its mixed-drove prevalence measure satisfies the
following.

$$\mathrm{Prob}_{t_m \in TF}\,\bigl[\, s\_prev(P_i, \mathrm{time\ slot}\ t_m) \ge \theta_p \,\bigr] \ge \theta_{time} \qquad (2)$$

where Prob stands for the probability, over all time slots in TF, that
the pattern is spatial prevalent, s_prev stands for spatial prevalence,
θ_p is the spatial prevalence threshold, and θ_time is the time
prevalence threshold.

For example, in Fig. 2, {A, B} is an MDCOP because it is spatial
prevalent in time slots t=0, t=1, t=2, and t=3, since its participation
indices are no less than the given threshold 0.4 in these time slots,
and is time prevalent, since its time prevalence index of 1 is above
the threshold 0.5. In contrast, {B, D} is not an MDCOP. Although it is
spatial prevalent in time slot t=2, it is not time prevalent since its
time prevalence index is below the given time prevalence threshold 0.5.
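The test in Definition 3.3 can be illustrated with a short Python sketch. It assumes the per-slot spatial prevalence values (participation indices) have already been computed; the function names and the numeric values below are hypothetical and were chosen only to mirror the qualitative behaviour described for {A, B} (spatial prevalent in all four time slots) and {B, D} (spatial prevalent only at t=2). They are not read from Fig. 2.

```python
def time_prevalence(spatial_prev_by_slot, theta_p):
    """Fraction of time slots whose spatial prevalence is >= theta_p."""
    if not spatial_prev_by_slot:
        raise ValueError("at least one time slot is required")
    hits = sum(1 for value in spatial_prev_by_slot if value >= theta_p)
    return hits / len(spatial_prev_by_slot)


def is_mixed_drove_prevalent(spatial_prev_by_slot, theta_p, theta_time):
    """Condition (2): the time prevalence must reach theta_time."""
    return time_prevalence(spatial_prev_by_slot, theta_p) >= theta_time


# Hypothetical participation indices for time slots t=0..3.
prev_ab = [0.60, 0.50, 0.40, 0.75]  # >= 0.4 in every slot
prev_bd = [0.00, 0.20, 0.50, 0.10]  # >= 0.4 only at t=2

theta_p, theta_time = 0.4, 0.5
print(time_prevalence(prev_ab, theta_p))                       # 4 of 4 slots
print(is_mixed_drove_prevalent(prev_ab, theta_p, theta_time))  # accepted
print(time_prevalence(prev_bd, theta_p))                       # 1 of 4 slots
print(is_mixed_drove_prevalent(prev_bd, theta_p, theta_time))  # rejected
```

Note that the slots in which a pattern is spatial prevalent need not be consecutive; only their count relative to the total number of time slots enters the measure.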

3.3  Problem  statement 

Given:

- A set P of Boolean spatio-temporal object-types over a common
spatio-temporal framework STF.

- A neighbor relation R over locations.

- A spatial prevalence threshold, θ_p.

- A time prevalence threshold, θ_time.

Find: {P_i | P_i is a subset of P and P_i is a prevalent MDCOP as in
Definition 3.3}.

Objective: Minimize computation cost.

Constraints: To find a correct and complete set of MDCOPs.

Threshold values selected for MDCOP interest measures (the spatial and
time prevalence measures) have important implications for the mining
processes and results. Selection of a small interest measure threshold
(close to 0) increases the algorithm's computational complexity and the
number of generated prevalent patterns. This may cause generation of
insignificant patterns. Selection of a large interest measure threshold
(close to 1) decreases the computational complexity of the algorithms
and the number of prevalent patterns. This may cause pruning of some of
the significant patterns. Nevertheless, the selection of interest
measure threshold values is dependent on the application and/or purpose
of the analysis.

Figure 3. Comparison of Naive Approach, MDCOP-Miner, and
FastMDCOP-Miner: (a) Naive Approach; (b) MDCOP-Miner; (c)
FastMDCOP-Miner

4.  Mining  MDCOPs 

In this section, we discuss a naive approach and then propose two novel
MDCOP mining algorithms, MDCOP-Miner and FastMDCOP-Miner, to mine
MDCOPs.

4.1  Naive  approach 

A naive approach can use a spatial co-location mining algorithm for
each time slot to find spatial prevalent co-locations and then apply a
post-processing step to discover MDCOPs by checking their time
prevalence. To mine co-locations, Huang, Shekhar and Xiong proposed a
join-based approach, Yoo, Shekhar and Celik proposed a partial
join-based approach and a join-less approach, and Zhang et al. proposed
a multi-way spatial join-based approach [2, 9, 19, 20]. This study will
be based on the join-based co-location algorithm proposed by Huang et
al., but it is also possible to use other approaches. The naive
approach will generate size k+1 candidate co-locations for each time
slot using spatial prevalent size k subclasses until there are no more
candidates. After finding the spatial prevalent co-locations of all
sizes in each time slot, a post-processing step can be used to discover
MDCOPs by pruning out time non-prevalent co-locations. Even though this
approach will prune out spatial non-prevalent co-locations early, it
will not prune out time non-prevalent MDCOPs before the post-processing
step (Fig. 3a). This leads to unnecessary computational cost.
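To make the two phases of the naive approach concrete, the following Python sketch handles size 2 patterns only. It is a simplified illustration under stated assumptions, not the join-based algorithm of Huang et al.: the neighbor relation is taken to be Euclidean distance within `max_dist`, neighbor pairs are found by brute force, and the function names and toy coordinates are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot


def spatial_prevalent_pairs(objects, max_dist, theta_p):
    """Spatial prevalent size 2 patterns for ONE time slot.

    objects: list of (object_type, object_id, x, y) tuples.
    Returns a set of sorted (type, type) tuples.
    """
    totals = defaultdict(set)
    for obj_type, obj_id, _, _ in objects:
        totals[obj_type].add(obj_id)

    # participants[pair][type] = ids of that type taking part in the pair
    participants = defaultdict(lambda: defaultdict(set))
    for (t1, i1, x1, y1), (t2, i2, x2, y2) in combinations(objects, 2):
        if t1 != t2 and hypot(x1 - x2, y1 - y2) <= max_dist:
            pair = tuple(sorted((t1, t2)))
            participants[pair][t1].add(i1)
            participants[pair][t2].add(i2)

    prevalent = set()
    for pair, by_type in participants.items():
        index = min(len(by_type[t]) / len(totals[t]) for t in pair)
        if index >= theta_p:
            prevalent.add(pair)
    return prevalent


def naive_size2_mdcops(slots, max_dist, theta_p, theta_time):
    """Phase 1: mine every slot fully. Phase 2: filter by time prevalence."""
    slot_counts = defaultdict(int)
    for objects in slots:
        for pair in spatial_prevalent_pairs(objects, max_dist, theta_p):
            slot_counts[pair] += 1
    return {p for p, c in slot_counts.items() if c / len(slots) >= theta_time}


# Toy data (hypothetical): one W, one C and one L object over 4 time slots.
slots = [
    [("W", 1, 0.0, 0.0), ("C", 1, 1.0, 0.0), ("L", 1, -1.0, 0.0)],
    [("W", 1, 0.0, 0.0), ("C", 1, 10.0, 0.0), ("L", 1, 20.0, 20.0)],
    [("W", 1, 0.0, 0.0), ("C", 1, 10.0, 0.0), ("L", 1, 20.0, 20.0)],
    [("W", 1, 5.0, 5.0), ("C", 1, 5.0, 6.0), ("L", 1, 20.0, 20.0)],
]
result = naive_size2_mdcops(slots, max_dist=1.5, theta_p=0.5, theta_time=0.5)
print(result)
```

In this toy data the W and C objects are neighbors at t=0 and t=3, while W and L are neighbors only at t=0, so only the W and C pair survives the post-processing filter. The sketch also shows the weakness noted above: every time slot is mined in full before any time-prevalence information is used.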

4.2  MDCOP-Miner 

To eliminate the drawbacks of the naive approach, we propose an MDCOP
mining algorithm (MDCOP-Miner) which incorporates a time-prevalence
based filtering step in each iteration. The algorithm will first
discover all size k spatial prevalent MDCOPs, and then will apply a
time-prevalence based filter to discover MDCOPs. Finally, the algorithm
will generate size k+1 candidate MDCOPs using size k MDCOPs (Fig. 3b).
The participation index is used as a spatial prevalence interest
measure to check if the pattern is spatial prevalent at a time slot
[9]. The time prevalence from Definition 3.1 is used as a time
prevalence interest measure. Algorithm 1 gives the pseudo-code of both
the MDCOP-Miner algorithm and the FastMDCOP-Miner discussed in the next
section. The choice of the algorithm is provided by the user. The
inputs are the algorithm choice alg_choice with value MDCOP-Miner, a
set of distinct spatial object-types E, a spatio-temporal dataset ST, a
spatial neighborhood relationship R, and thresholds of interest
measures, i.e., spatial prevalence and time prevalence; the output is a
set of MDCOPs. In the algorithm, step 1 initializes the parameters,
steps 2 through 14 give an iterative process to mine MDCOPs, and step
15 gives a union of the results. Steps 2 through 14 continue until
there are no candidate MDCOPs to be generated.


4.3  FastMDCOP-Miner 

In this section, we discuss the FastMDCOP-Miner algorithm, which
further improves on the computational efficiency of the MDCOP-Miner
discussed in Section 4.2. As can be seen in Fig. 3b and in Algorithm 1,
MDCOP-Miner waits to prune time non-prevalent patterns until all size k
spatial prevalent patterns are generated for all time slots and then
prunes time non-prevalent patterns to discover MDCOPs. However, we can
further optimize the MDCOP-Miner by pruning time non-prevalent patterns
at an earlier stage. We move "prune non-prevalent patterns" between the
time slots, as shown in Fig. 3c, where the candidate size 2 pattern
generation is illustrated.

Algorithm 1: MDCOP-Miner and FastMDCOP-Miner

Inputs:
  alg_choice: MDCOP-Miner or FastMDCOP-Miner
  E: a set of distinct spatial object-types
  ST: a spatio-temporal dataset
      <object_type, object_id, x, y, time_slot>
  R: spatial neighborhood relationship
  TF: a time slot frame {t_0, ..., t_{n-1}}
  θ_p: a spatial prevalence threshold
  θ_time: a time prevalence threshold
Output: MDCOPs whose spatial prevalence indices are no less than θ_p,
  for time prevalence indices no less than θ_time
Variables:
  k: co-occurrence size
  t: time slots (0, ..., n-1)
  T_k: set of instances of size k co-occurrences
  C_k: set of candidate size k co-occurrences
  SP_k: set of spatial prevalent size k co-occurrences
  TP_k: set of time prevalent size k co-occurrences
  MDP_k: set of mixed-drove size k co-occurrences
Algorithm:
   1) initialization: k = 1, C_k = E, MDP_k(0) = ST
   2) while (not empty MDP_k) {
   3)   C_{k+1}(0) = gen_candidate_co_occ(C_k, MDP_k)
   4)   for each time_slot t in (0, ..., n-1) {
   5)     T_{k+1}(t) = gen_co_occ_inst(C_{k+1}(t), T_k(t), R)
   6)     SP_{k+1}(t) = find_spatial_prev_co_occ(T_{k+1}(t), C_{k+1}(t), θ_p)
   7)     if (alg_choice == "FastMDCOP-Miner") {
   8)       TP_{k+1}(t) = find_time_index(SP_{k+1}(t))
   9)       MDP_{k+1}(t) = find_time_prev_co_occ(TP_{k+1}(t), θ_time)
  10)       C_{k+1}(t+1) = MDP_{k+1}(t) } }
  11)   if (alg_choice == "MDCOP-Miner") {
  12)     TP_{k+1} = find_time_index(SP_{k+1})
  13)     MDP_{k+1} = find_time_prev_co_occ(TP_{k+1}, θ_time) }
  14)   k = k + 1 }
  15) return union(MDP_2, ..., MDP_{k+1})

The pseudo-code of the FastMDCOP-Miner is also given in Algorithm 1.
When the FastMDCOP-Miner is chosen, the algorithm will activate steps
8, 9, and 10 and deactivate steps 12 and 13. This will allow the
algorithm to check the time prevalence of a pattern after every time
slot is processed. The functions of the algorithm are as described in
Section 4.2. In step 8, FastMDCOP-Miner checks whether the time
prevalence indices of size k patterns (size 2 patterns in Fig. 3c)
satisfy the time prevalence threshold before generating size k patterns
for the next time slot. Early discovered time non-prevalent patterns
are pruned in step 9 and time prevalent patterns are used as candidate
co-occurrences (step 10) in the next time slot. For example, assume
that there are 10 time slots and the time prevalence threshold is 0.5.
In this case, a size k pattern should be present for at least 5 time
slots to satisfy the threshold. If a pattern has not occurred in the
first (or any) 6 time slots, there is no need to generate it and check
its prevalence for the rest of the time slots, since it will now be
impossible for it to satisfy the given threshold regardless of the
remaining results.
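The early-termination argument in the example above amounts to a best-case bound: even if a pattern were spatial prevalent in every remaining time slot, its time prevalence could not exceed (hits + remaining) / n. The Python sketch below states that bound directly. It is one possible reading of the check performed in steps 8 and 9, not the authors' code, and the function name and parameters are hypothetical.

```python
def can_still_be_time_prevalent(hits, slots_seen, total_slots, theta_time):
    """Best-case check for pruning a pattern between time slots.

    hits        : slots seen so far in which the pattern was spatial prevalent
    slots_seen  : number of time slots already processed
    total_slots : total number of time slots n
    theta_time  : time prevalence threshold
    """
    if not 0 <= hits <= slots_seen <= total_slots or total_slots == 0:
        raise ValueError("inconsistent slot counts")
    remaining = total_slots - slots_seen
    # Optimistic assumption: the pattern is prevalent in all remaining slots.
    best_case = (hits + remaining) / total_slots
    return best_case >= theta_time


# Example from the text: 10 time slots, threshold 0.5.
# No occurrence in 6 processed slots leaves at most 4/10, so prune.
print(can_still_be_time_prevalent(0, 6, 10, 0.5))
# After only 5 empty slots, 5/10 is still reachable, so keep the candidate.
print(can_still_be_time_prevalent(0, 5, 10, 0.5))
```

Patterns for which this check fails can be dropped from the candidate set carried into the next time slot, which is the source of the savings over MDCOP-Miner.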


5.  Experimental  Evaluation 

In this section, we present our experimental evaluations of several
design decisions and workload parameters on our MDCOP mining
algorithms. We used a real-world training dataset. Experiments were
conducted on an IBM Netfinity Linux Cluster, 2.6 GHz Intel Pentium 4
with 1.5 GB of RAM. We evaluated the behavior of the FastMDCOP-Miner,
MDCOP-Miner and naive approach to answer the following questions:

- What is the effect of the number of time slots?

- What is the effect of the number of object-types?

- What is the effect of the spatial prevalence threshold?

- What is the effect of the time prevalence threshold?

Dataset: The real dataset contains the location and time information of
moving objects (Figure 4). It includes 15 time snapshots and 22
distinct vehicle types and their instances. The minimum number of
instances is 2, the maximum is 78, and the average is 19.


Figure  4.  Real  Dataset 


1. Effect of Number of Time Slots: We evaluated the effect of the
number of time slots on the execution time of the MDCOP algorithms
using the real dataset. The participation index threshold, time
prevalence threshold, and neighborhood distance were set at 0.2, 0.8,
and 150m respectively. Experiments were run for a minimum of 1 time
slot and a maximum of 14 time slots. Results showed that the
FastMDCOP-Miner requires less execution time than the MDCOP-Miner and
naive approaches, since it prunes out time non-prevalent MDCOPs as
early as possible (Fig. 5a). As the number of time slots increases,
execution time grows more slowly for FastMDCOP-Miner than for the other
approaches. Fig. 5b shows the number of generated size 2 and size 3
instances for the algorithms. The FastMDCOP-Miner generates fewer
patterns. The MDCOP-Miner and naive approaches generate the same number
of size 2 instances.

Figure 5. Effect of number of time slots: (a) execution time; (b)
generated instances (x-axis: number of time slots)

2. Effect of Number of Object-types: We evaluated the effect of the
number of object-types on the execution time of the algorithms using
the real dataset. The participation index, time prevalence index,
number of time slots and distance were set at 0.2, 0.8, 15, and 150m
respectively. Results showed that the FastMDCOP-Miner outperforms the
other approaches as the number of object-types increases (Fig. 6a-b).
It is observed that the increase in execution time for the naive
approach is bigger than that of the MDCOP-Miner and the FastMDCOP-Miner
as the number of object-types increases.

The execution times of the algorithms increase as the number of
object-types increases due to the increase in the number of join
operations. The MDCOP-Miner and naive approaches generate the same
number of size 2 instances. In contrast, the FastMDCOP-Miner generates
fewer size 2 instances (Fig. 6b).

Figure 6. Effect of number of object-types: (a) execution time; (b)
generated instances (x-axis: number of object-types)

3. Effect of the Time Prevalence Threshold: We evaluated the effect of
the time prevalence threshold on the execution times of the MDCOP
mining algorithms for the real dataset. The fixed parameters were
participation index, number of time slots, and distance, and their
values were 0.2, 15, and 150m respectively. For the naive approach, the
effective cost in execution time to generate spatial prevalent
co-locations will be constant since it generates the same number of
spatial prevalent patterns as the time prevalence index increases. In
that case, the cost of the post-processing step will reflect the trend
of the naive approach. Experimental results show that the
FastMDCOP-Miner is more computationally efficient than the other
approaches (Fig. 7a-b). The execution times of the FastMDCOP-Miner and
MDCOP-Miner decrease as the time prevalence threshold increases. It is
also observed that the naive approach is computationally more expensive
as the time prevalence threshold decreases because of the increase in
the number of MDCOPs to be discovered.

Figure 7. Effect of the time prevalence threshold: (a) execution time;
(b) generated instances (x-axis: time prevalence index threshold)

4. Effect of the Spatial Prevalence Threshold: We evaluated the effect
of the spatial prevalence threshold on the execution times of the MDCOP
algorithms. The fixed parameters were time prevalence index, number of
time slots, and distance, with values of 0.2, 15, and 100m
respectively. Fig. 8a shows the execution times of the algorithms and
Fig. 8b shows the number of generated size 2 and 3 instances for the
algorithms. FastMDCOP-Miner and MDCOP-Miner do not generate more than
size 3 instances for a spatial prevalence threshold of greater than
0.2. The FastMDCOP-Miner outperforms the MDCOP-Miner and naive
approaches as the spatial prevalence threshold increases (Fig. 8a-b).
The cost of the naive approach is higher than that of the
FastMDCOP-Miner and MDCOP-Miner for low values of the spatial
prevalence threshold.

Figure 8. Effect of the spatial prevalence threshold

6. Conclusions and Future Work

We defined mixed-drove spatio-temporal co-occurrence patterns (MDCOPs)
and the MDCOP mining problem and proposed a new monotonic composite
interest measure which is the composition of distinct object-types,
spatial prevalence measures, and time prevalence measures. We presented
two novel and computationally efficient algorithms for mining these
patterns: the MDCOP-Miner, and the FastMDCOP-Miner. We compared our
algorithms with a naive approach and showed their superiority in
experiments using a real dataset to examine the effects of the number
of time slots, the number of object-types, and the values of the
spatial and time prevalence thresholds. The two new proposed algorithms
are correct and complete in finding mixed-drove prevalent patterns.

For future work, we would like to explore the relationship between the
proposed MDCOP interest measures and spatio-temporal statistical
measures of interaction [2]. Another problem of interest is the
characterization of the probability distribution of the proposed
interest measure to help choose thresholds in the proposed measures. We
plan to explore other potential interest measures for MDCOPs by
evaluating similarity measures for tracks of moving objects. We plan to
investigate new monotonic composite interest measures and develop other
new computationally efficient algorithms for mining MDCOPs. We also
hope to extend our algorithm to mine newly defined patterns in the
literature such as leadership, convergence and query processing [12].